APPARATUS AND METHOD FOR DERIVING DIRECTIONAL INFORMATION, AND SYSTEMS
PATENT ABSTRACT
An apparatus for deriving directional information from a plurality of microphone signals or from a plurality of components of a microphone signal, in which different effective microphone look directions are associated with the microphone signals or components, comprises a combiner configured to obtain a magnitude value of a microphone signal or of a component of the microphone signal. The combiner is further configured to combine direction information items that describe the effective microphone look directions, so that a direction information item describing a given effective microphone look direction is weighted depending on the magnitude value of the microphone signal, or of the microphone signal component, associated with that effective microphone look direction, to derive the directional information.
Publication number: BR112013010258B1
Application number: R112013010258-6
Filing date: 2011-10-26
Publication date: 2020-12-29
Inventors: Fabian Küch; Giovanni Del Galdo; Oliver Thiergart; Ville Pulkki; Jukka Ahonen
Applicant: Fraunhofer Gesellschaft zur Foerderung der angewandten Forschung e.V.
IPC main class:
PATENT DESCRIPTION
TECHNICAL FIELD

Embodiments of the present invention relate to an apparatus for deriving directional information from a plurality of microphone signals or from a plurality of components of a microphone signal. Further embodiments relate to systems comprising such an apparatus. Further embodiments relate to a method for deriving directional information from a plurality of microphone signals.

BACKGROUND OF THE INVENTION

Spatial sound recording aims to capture a sound field with multiple microphones such that, on the reproduction side, a listener perceives the sound image as it was present at the recording location. Standard approaches to spatial sound recording use conventional stereo microphones or more sophisticated combinations of directional microphones, for example the B-format microphones used in Ambisonics (M. A. Gerzon, Periphony: With-height sound reproduction, J. Audio Eng. Soc., 21(1):2-10, 1973). Most of these methods are commonly referred to as coincident-microphone techniques. Alternatively, methods based on a parametric representation of sound fields can be applied, which are referred to as parametric spatial audio coders. These methods determine one or more downmix audio signals together with corresponding side information that is relevant to the perception of spatial sound. Examples are Directional Audio Coding (DirAC), as discussed in V. Pulkki, Spatial sound reproduction with directional audio coding, J. Audio Eng. Soc., 55(6):503-516, June 2007, or the so-called spatial audio microphone (SAM) approach proposed in C. Faller, Microphone front-ends for spatial audio coders, in 125th AES Convention, Paper 7508, San Francisco, Oct. 2008. The spatial cue information is determined in frequency subbands and basically consists of the direction of arrival (DOA) of the sound and, sometimes, the diffuseness of the sound field or other statistical measures. In a synthesis stage, the desired loudspeaker signals for reproduction are determined based on the downmix signals and the parametric side information. In addition to spatial audio recording, parametric sound field representations have been used in applications such as directional filtering (M. Kallinger, H. Ochsenfeld, G. Del Galdo, F. Kuech, D. Mahne, R. Schultz-Amling, and O. Thiergart, A spatial filtering approach for directional audio coding, in 126th AES Convention, Paper 7653, Munich, Germany, May 2009) or source localization (O. Thiergart, R. Schultz-Amling, G. Del Galdo, D. Mahne, and F. Kuech, Localization of sound sources in reverberant environments based on directional audio coding parameters, in 127th AES Convention, Paper 7853, New York City, NY, USA, Oct. 2009). These techniques also rely on directional parameters, such as the DOA of the sound or the diffuseness of the sound field. One way to estimate the directional information of a sound field, namely the direction of arrival of the sound, is to measure the field at different points with an array of microphones. Several approaches proposed in the literature (J. Chen, J. Benesty, and Y. Huang, Time delay estimation in room acoustic environments: An overview, in EURASIP Journal on Applied Signal Processing, Article ID 26503, 2006) use estimates of the relative time delay between microphone signals. However, these approaches make use of the phase information of the microphone signals, which inevitably leads to spatial aliasing: the higher the analyzed frequency, the shorter the wavelength becomes.
At a given frequency, called the aliasing frequency, the wavelength is such that identical phase readings correspond to two or more directions, so that an unambiguous estimate is not possible (at least not without additional a priori information). There is a wide variety of methods for estimating the DOA of sound using microphone arrays. An overview of common approaches is given in J. Chen, J. Benesty, and Y. Huang, Time delay estimation in room acoustic environments: An overview, in EURASIP Journal on Applied Signal Processing, Article ID 26503, 2006. These approaches have in common that they exploit the phase relationship between the microphone signals to estimate the DOA of the sound. Generally, the time difference between different sensors is determined first, and then knowledge of the array geometry is exploited to compute the corresponding DOA. Other approaches evaluate the correlation between different microphone signals in frequency subbands to estimate the DOA of the sound (C. Faller, Microphone front-ends for spatial audio coders, in 125th AES Convention, Paper 7508, San Francisco, Oct. 2008, and J. Chen, J. Benesty, and Y. Huang, Time delay estimation in room acoustic environments: An overview, in EURASIP Journal on Applied Signal Processing, Article ID 26503, 2006).

In DirAC, the DOA estimate for each frequency band is determined based on the active sound intensity vector measured in the observed sound field. In the following, the estimation of the directional parameters in DirAC is briefly summarized. Let P(k, n) denote the sound pressure and U(k, n) the particle velocity vector at frequency index k and time index n. The active sound intensity vector is then obtained as

I_a(k, n) = (1/2) Re{P(k, n) U*(k, n)},   (1)

where the superscript * denotes the complex conjugate and Re{·} is the real part of a complex number. The opposite direction of I_a(k, n) points toward the DOA of the sound:

e_DOA(k, n) = -I_a(k, n) / ‖I_a(k, n)‖.   (2)

Additionally, the diffuseness of the sound field can be determined, for example, according to

Ψ(k, n) = 1 - ‖E{I_a(k, n)}‖ / (c E{E(k, n)}),   (3)

where E{·} denotes the expectation operator and E(k, n) = (ρ_0/4) ‖U(k, n)‖² + |P(k, n)|² / (4 ρ_0 c²) is the energy density of the sound field; ρ_0 represents the mean density of the air. In practice, the particle velocity vector is computed from the pressure gradient of closely spaced omnidirectional microphone capsules, generally referred to as differential microphone arrays. Considering Figure 2, the x component of the particle velocity vector can, for example, be computed using a pair of microphones according to

U_x(k, n) = K(k) [P_1(k, n) - P_2(k, n)],   (4)

where K(k) represents a frequency-dependent normalization factor. Its value depends on the microphone configuration, for example the distance between the microphones and/or their directivity patterns. The remaining components U_y(k, n) (and U_z(k, n)) of U(k, n) can be determined in a similar way by combining suitable pairs of microphones. As shown in M. Kallinger, F. Kuech, R. Schultz-Amling, G. Del Galdo, J. Ahonen, and V. Pulkki, Analysis and Adjustment of Planar Microphone Arrays for Application in Directional Audio Coding, in 124th AES Convention, Paper 7374, Amsterdam, The Netherlands, May 2008, spatial aliasing affects the phase information of the particle velocity vector, prohibiting the use of pressure gradients for the estimation of the active sound intensity at high frequencies. This spatial aliasing generates ambiguities in the DOA estimates. The maximum frequency f_max at which unambiguous DOA estimates can be obtained based on the active sound intensity is thus determined by the distance between the microphone pairs.
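To make equations (1)-(4) concrete, the following minimal Python sketch derives a 2D DOA for one time-frequency bin from two orthogonal microphone pairs. The function name, the approximation of P by the array mean, and the handling of the normalization factor K(k) are our own illustrative assumptions, not part of the patent.

```python
import numpy as np

def doa_from_pressure_gradients(P1, P2, P3, P4, K):
    """2D DOA estimate for one time-frequency tile (k, n) via eqs. (1)-(4).

    P1/P2 and P3/P4 are complex STFT coefficients of two orthogonal
    microphone pairs (along x and y); K is the frequency-dependent
    normalization factor K(k), assumed to absorb the 1/(rho_0 * d) scaling.
    """
    P = 0.25 * (P1 + P2 + P3 + P4)        # pressure at the array center
    U = K * np.array([P1 - P2, P3 - P4])  # particle velocity, eq. (4)
    Ia = 0.5 * np.real(P * np.conj(U))    # active intensity, eq. (1)
    e_doa = -Ia / np.linalg.norm(Ia)      # DOA unit vector, eq. (2)
    return np.degrees(np.arctan2(e_doa[1], e_doa[0]))
```

Above f_max, the phase differences P_1 - P_2 become ambiguous, which is exactly where this phase-based estimator fails and where the magnitude-based approach of the embodiments becomes useful.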
Additionally, the estimation of directional parameters, such as the diffuseness of the sound field, is also affected. In the case of omnidirectional microphones with a spacing d, this maximum frequency is given by

f_max = c / (2d),   (5)

where c denotes the speed of sound propagation. For example, a spacing of d = 2 cm yields f_max of roughly 8.6 kHz. Typically, the frequency range required by applications that exploit directional sound field information is greater than the spatial aliasing limit f_max to be expected for practical microphone configurations. Note that reducing the microphone spacing d, which increases the spatial aliasing limit f_max, is not a viable solution for most applications, since a very small d significantly reduces the reliability of the low-frequency estimates in practice. Thus, new methods are needed to overcome the limitations of current techniques for estimating directional parameters at high frequencies.

3. SUMMARY OF THE INVENTION

It is an objective of embodiments of the present invention to provide a concept that allows a better determination of directional information above a spatial aliasing limit frequency. This objective is solved by an apparatus according to claim 1, systems according to claims 15 and 16, a method according to claim 18, and a computer program according to claim 19.

Embodiments provide an apparatus for deriving directional information from a plurality of microphone signals or from a plurality of components of a microphone signal, in which different effective microphone look directions are associated with the microphone signals or components. The apparatus comprises a combiner configured to obtain a magnitude value of a microphone signal or of a component of the microphone signal. In addition, the combiner is configured to combine (for example, linearly combine) direction information items that describe the effective microphone look directions, such that a direction information item describing a given effective microphone look direction is weighted depending on the magnitude value of the microphone signal, or of the microphone signal component, associated with that look direction, to derive the directional information.

It was found that the problem of spatial aliasing in directional parameter estimation results from ambiguities in the phase information of the microphone signals. It is an idea of embodiments of the present invention to overcome this problem by deriving the directional information based on the magnitude values of the microphone signals. It was found that, when deriving directional information based on the magnitude values of the microphone signals or of components of the microphone signals, the ambiguities that occur in traditional systems, which use the phase information to determine the directional information, do not occur. Thus, embodiments allow a determination of directional information even above the spatial aliasing limit, above which a determination of directional information using phase information is not possible (or only possible with errors). Using the magnitudes of the microphone signals or microphone signal components is especially beneficial within frequency regions in which spatial aliasing or other phase distortions are expected, since these phase distortions have no influence on the magnitude values and, therefore, do not lead to ambiguities in the determination of the directional information.
According to some embodiments, the effective microphone look direction associated with a microphone signal describes the direction in which the microphone from which the microphone signal is derived has its maximum response (or its highest sensitivity). As an example, the microphone can be a directional microphone having a non-isotropic pick-up pattern, and the effective microphone look direction can be defined as the direction in which the pick-up pattern of the microphone has its maximum. Thus, for a directional microphone, the effective microphone look direction can be equal to the look direction of the microphone (which describes the direction of maximum sensitivity of the directional microphone), for example when no object modifying the pick-up pattern of the directional microphone is placed close to the microphone. The effective microphone look direction can differ from the look direction of the directional microphone if the directional microphone is placed next to an object that modifies its pick-up pattern. In this case, the effective microphone look direction can describe the direction in which the directional microphone has its maximum response. In the case of an omnidirectional microphone, an effective response pattern of the omnidirectional microphone can be formed, for example, using a shading object (which has the effect of modifying the pick-up pattern of the microphone), so that the resulting effective response pattern has an effective microphone look direction, namely the direction of maximum response of the omnidirectional microphone with the formed effective response pattern.

According to further embodiments, the directional information can be directional information of a sound field that points in the direction from which the sound field is propagating (for example, at given frequency and time indices). The plurality of microphone signals can describe the sound field.

According to some embodiments, a direction information item that describes a given effective microphone look direction can be a vector pointing in that effective microphone look direction. According to further embodiments, the direction information items can be unit vectors, so that the direction information items associated with the different effective microphone look directions have equal norms (but different directions). Therefore, the norm of a weighted vector, as linearly combined by the combiner, is determined by the magnitude value of the microphone signal or of the microphone signal component associated with that direction information item.

According to further embodiments, the combiner can be configured to obtain the magnitude value such that it describes the magnitude of a spectral coefficient (as a component of the microphone signal) representing a spectral subregion of the microphone signal. In other words, embodiments can extract the directional information of a sound field (for example, analyzed in a time-frequency domain) from the magnitudes of the microphone spectra used to derive the microphone signals. According to further embodiments, only the magnitude values (or magnitude information) of the microphone signals (or of the microphone spectra) are used in the estimation process to derive the directional information, since the phase term is corrupted by the spatial aliasing effect.
In other words, embodiments provide an apparatus and a method for directional parameter estimation using only the magnitude information of the microphone signals or of the components of the microphone signals (that is, of their spectra). According to further embodiments, the output of the magnitude-based directional parameter estimation (the directional information) can be combined with other techniques that also consider the phase information. According to further embodiments, the magnitude value can describe a magnitude of the microphone signal or of the component.

4. BRIEF DESCRIPTION OF THE FIGURES

Embodiments of the present invention will be described in detail using the accompanying figures, in which:

Figure 1 shows a schematic block diagram of an apparatus according to an embodiment of the present invention;
Figure 2 shows an illustration of a microphone configuration using four omnidirectional capsules providing sound pressure signals P_i(k, n) with i = 1, ..., 4;
Figure 3 shows an illustration of a microphone configuration using four directional microphones with cardioid pick-up patterns;
Figure 4 shows an illustration of a microphone configuration that employs a rigid cylinder to cause scattering and shading effects;
Figure 5 shows an illustration of a microphone configuration similar to Figure 4, but employing a different microphone placement;
Figure 6 shows an illustration of a microphone configuration that employs a rigid hemisphere to cause scattering and shading effects;
Figure 7 shows an illustration of a 3D microphone configuration that employs a rigid sphere to cause shading effects;
Figure 8 shows a flow chart of a method according to an embodiment;
Figure 9 shows a schematic block diagram of a system according to an embodiment;
Figure 10 shows a schematic block diagram of a system according to a further embodiment of the present invention;
Figure 11 shows an illustration of an arrangement of four omnidirectional microphones with spacing d between opposing microphones;
Figure 12 shows an illustration of an arrangement of four omnidirectional microphones mounted on the end of a cylinder;
Figure 13 presents a diagram of the directivity index DI in decibels as a function of ka, which represents the diaphragm circumference of an omnidirectional microphone divided by the wavelength;
Figure 14 shows logarithmic directional patterns with the G.R.A.S. microphone;
Figure 15 shows logarithmic directional patterns with the AKG microphone; and
Figure 16 shows diagram results of the direction analysis expressed as root-mean-square error (RMSE).

Before embodiments of the present invention are described in more detail using the accompanying figures, it should be noted that elements of the same or equal functionality are provided with the same reference numbers, and that a repeated description of elements provided with the same reference numbers is omitted. Consequently, the descriptions provided for elements with the same reference numbers are mutually interchangeable.

5. DETAILED DESCRIPTION OF EMBODIMENTS OF THE PRESENT INVENTION

5.1 APPARATUS ACCORDING TO FIGURE 1

Figure 1 shows an apparatus 100 according to an embodiment of the present invention. The apparatus 100 for deriving directional information 101 (also denoted as d(k, n)) from a plurality of microphone signals 103_1 to 103_N (also denoted as P_1 to P_N) or from a plurality of components of a microphone signal comprises a combiner 105.
The combiner 105 is configured to obtain a magnitude value of a microphone signal or of a microphone signal component, and to linearly combine the direction information items describing the effective microphone look directions that are associated with the microphone signals 103_1 to 103_N or components, such that a direction information item describing a given effective microphone look direction is weighted depending on the magnitude value of the microphone signal, or of the microphone signal component, associated with that effective microphone look direction, to derive the directional information 101.

A component of an i-th microphone signal P_i can be denoted as P_i(k, n). The component P_i(k, n) of the microphone signal P_i can be a value of the microphone signal P_i at frequency index k and time index n. The microphone signal P_i can be derived from an i-th microphone and can be available to the combiner 105 in a time-frequency representation comprising a plurality of components P_i(k, n) for different frequency indices k and time indices n. As an example, the microphone signals P_1 to P_N can be sound pressure signals, as they can be derived from B-format microphones. Therefore, each component P_i(k, n) can correspond to a time-frequency tile (k, n).

The combiner 105 can be configured to obtain the magnitude value such that it describes the magnitude of a spectral coefficient representing a spectral subregion of the microphone signal P_i. This spectral coefficient can be a component P_i(k, n) of the microphone signal P_i. The spectral subregion can be defined by the frequency index k of the component P_i(k, n). In addition, the combiner 105 can be configured to derive the directional information 101 based on a time-frequency representation of the microphone signals, for example one in which a microphone signal P_i is represented by a plurality of components P_i(k, n), each component being associated with a time-frequency tile (k, n).

As described in the introductory part of this application, by deriving the directional information d(k, n) based on the magnitude values of the microphone signals P_1 to P_N or of components of a microphone signal, a determination of the directional information d(k, n) can be achieved even for components P_1(k, n) to P_N(k, n) having a frequency index above the frequency index of the spatial aliasing frequency f_max, since spatial aliasing or other phase distortions cannot corrupt the magnitudes.

The following is a detailed example of an embodiment of the present invention, based on a combination of the magnitudes of the microphone signals (directional magnitude combination), as it can be performed by the apparatus 100 according to Figure 1. The directional information d(k, n), also denoted as the DOA estimate, is obtained by interpreting the magnitude of each microphone signal (or of each component of a microphone signal) as a corresponding vector in a two-dimensional (2D) or three-dimensional (3D) space. Let d_t(k, n) be the true vector that points in the direction from which the sound field is propagating at frequency and time indices k and n, respectively. In other words, the DOA of the sound corresponds to the direction of d_t(k, n). Estimating d_t(k, n), so that the directional information of the sound field can be extracted, is the objective of embodiments of the invention.
Furthermore, let b_1, b_2, ..., b_N be vectors (for example, unit-norm vectors) that point in the look directions of the N directional microphones. The look direction of a directional microphone is defined as the direction in which its pick-up pattern has its maximum. Similarly, in the case of scattering/shading objects being included in the microphone configuration, the vectors b_1, b_2, ..., b_N point in the direction of the maximum response of the corresponding microphone. The vectors b_1, b_2, ..., b_N can be designated as direction information items that describe the effective microphone look directions of the first to the N-th microphone. In this example, the direction information items are vectors that point in the corresponding effective microphone look directions. According to further embodiments, a direction information item can also be a scalar, for example an angle describing the look direction of the corresponding microphone. Furthermore, in this example, the direction information items can be unit-norm vectors, so that the vectors associated with different effective microphone look directions have equal norms.

It should also be noted that the proposed method can work best if the sum of the vectors b_i corresponding to the effective microphone look directions is equal to zero (for example, within a tolerance range), that is,

Σ_{i=1}^{N} b_i = 0.   (6)

In some embodiments, the tolerance range can be ±30%, ±20%, ±10%, or ±5% of one of the direction information items used to derive the sum (for example, of the direction information item having the largest norm, of the direction information item having the smallest norm, or of the direction information item having the norm closest to the average of all the norms of the direction information items used to derive the sum).

In some embodiments, the effective microphone look directions may not be evenly distributed with respect to a coordinate system. For example, assume a system in which a first effective microphone look direction of a first microphone is EAST (for example, 0 degrees in a two-dimensional coordinate system), a second effective microphone look direction of a second microphone is NORTHEAST (for example, 45 degrees in the two-dimensional coordinate system), a third effective microphone look direction of a third microphone is NORTH (for example, 90 degrees in the two-dimensional coordinate system), and a fourth effective microphone look direction of a fourth microphone is SOUTHWEST (for example, -135 degrees in the two-dimensional coordinate system). Using direction information items that are unit-norm vectors would result in:

b_1 = [1, 0]^T for the first effective microphone look direction;
b_2 = [1/√2, 1/√2]^T for the second effective microphone look direction;
b_3 = [0, 1]^T for the third effective microphone look direction; and
b_4 = [-1/√2, -1/√2]^T for the fourth effective microphone look direction.

This would lead to a nonzero sum of the vectors:

b_sum = b_1 + b_2 + b_3 + b_4 = [1, 1]^T.

As it is desired in some embodiments to have the sum of the vectors be zero, a direction information item that is a vector pointing in an effective microphone look direction can be scaled. In this example, the direction information item b_4 can be scaled as

b_4 = [-(1 + 1/√2), -(1 + 1/√2)]^T,

resulting in a sum of vectors b_sum that is equal to zero:

b_sum = b_1 + b_2 + b_3 + b_4 = [0, 0]^T.
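As a quick illustration (our own snippet, reusing the numeric example above), the scaling step can be implemented and checked in a few lines:

```python
import numpy as np

def rescale_last_vector(b):
    """Scale the last look-direction vector so that sum(b) = 0, cf. eq. (6)."""
    b = [np.asarray(v, dtype=float) for v in b]
    b[-1] = -np.sum(b[:-1], axis=0)  # replace b_N by minus the partial sum
    return b

# Look directions EAST, NORTHEAST, NORTH, SOUTHWEST from the example:
b = [[1, 0], [1 / np.sqrt(2), 1 / np.sqrt(2)], [0, 1],
     [-1 / np.sqrt(2), -1 / np.sqrt(2)]]
b = rescale_last_vector(b)   # b[3] becomes [-(1 + 1/sqrt(2)), -(1 + 1/sqrt(2))]
print(np.sum(b, axis=0))     # -> [0. 0.]
```

Note that the rescaling only changes the norm of b_4, not its direction, so the fourth microphone still looks SOUTHWEST.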
In other words, according to some embodiments, different direction information items, i.e. vectors pointing in different effective microphone look directions, can have different norms, which can be chosen such that the sum of the direction information items is equal to zero.

The estimate d(k, n) of the true vector d_t(k, n), and therefore the directional information to be determined, can be defined as

d(k, n) = Σ_{i=1}^{N} |P_i(k, n)|^K b_i,   (7)

where P_i(k, n) denotes the signal of the i-th microphone (or the microphone signal component P_i(k, n) of the i-th microphone) associated with the time-frequency tile (k, n). Equation (7) forms a linear combination of the direction information items b_1 to b_N of the first to the N-th microphone, weighted by the magnitude values of the components P_1(k, n) to P_N(k, n) of the microphone signals P_1 to P_N derived from the first to the N-th microphone. Therefore, the combiner 105 can evaluate equation (7) to derive the directional information 101 (d(k, n)).

As can be seen from equation (7), the combiner 105 can be configured to linearly combine the direction information items b_1 to b_N weighted depending on the magnitude values associated with a given time-frequency tile (k, n), in order to derive the directional information d(k, n) for that time-frequency tile. According to further embodiments, the combiner 105 can be configured to linearly combine the direction information items b_1 to b_N weighted only depending on the magnitude values associated with the given time-frequency tile (k, n). Furthermore, equation (7) shows that the combiner 105 can be configured to linearly combine, for a plurality of different time-frequency tiles, the same direction information items b_1 to b_N (since they are independent of the time-frequency tiles) describing the different effective microphone look directions, while the direction information items can be weighted differently depending on the magnitude values associated with the different time-frequency tiles. Since the direction information items b_1 to b_N can be unit vectors, the norm of a weighted vector formed by multiplying a direction information item b_i by a magnitude value is defined by the magnitude value. Weighted vectors for the same effective microphone look direction but for different time-frequency tiles have the same direction, but differ in their norms due to the different magnitude values. According to some embodiments, the weights can be scalar values.

The factor K in equation (7) can be chosen freely. In the case where K = 2 and the opposing microphones (from which the microphone signals P_1 to P_N are derived) are equidistant, the directional information d(k, n) is proportional to the energy gradient at the center of the array (for example, of a set of two microphones). In other words, the combiner 105 can be configured to obtain squared magnitude values based on the magnitude values, a squared magnitude value describing a power of a component P_i(k, n) of a microphone signal P_i. In addition, the combiner 105 can be configured to linearly combine the direction information items b_1 to b_N such that a direction information item b_i is weighted depending on the squared magnitude value of the component P_i(k, n) of the microphone signal P_i associated with the corresponding look direction (of the i-th microphone).
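As an illustration only (variable names and the default choice K = 2 are ours), equation (7) translates into a one-line combination per time-frequency tile:

```python
import numpy as np

def combine_magnitudes(P, B, K=2):
    """Magnitude-based direction estimate d(k, n) according to eq. (7).

    P: length-N vector of complex spectral coefficients P_i(k, n).
    B: N x 2 (or N x 3) matrix whose rows are the direction information
       items b_i (ideally unit vectors summing to zero, eq. (6)).
    K: magnitude exponent; K = 2 corresponds to power weighting.
    """
    w = np.abs(np.asarray(P)) ** K         # |P_i(k, n)|^K: phase is discarded
    return w @ np.asarray(B, dtype=float)  # sum_i |P_i(k, n)|^K * b_i
```

For the four-microphone 2D configuration of Figure 3 with B = [[1, 0], [0, 1], [-1, 0], [0, -1]], this reduces exactly to equations (13) and (14) below.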
From d(k, n), the directional information expressed in terms of the azimuth angle φ and the elevation angle θ is easily obtained by considering that

d(k, n) = ‖d(k, n)‖ [cos φ cos θ, sin φ cos θ, sin θ]^T.   (8)

In some applications, where only a 2D analysis is required, four directional microphones, for example arranged as in Figure 3, can be used. In this case, the direction information items can be chosen as

b_1 = [1, 0]^T, b_2 = [0, 1]^T, b_3 = [-1, 0]^T, b_4 = [0, -1]^T,   (9)-(12)

so that (7) becomes

d_x(k, n) = |P_1(k, n)|^K - |P_3(k, n)|^K,   (13)
d_y(k, n) = |P_2(k, n)|^K - |P_4(k, n)|^K.   (14)

This approach can similarly be applied with rigid objects placed in the microphone configuration. As an example, Figures 4 and 5 illustrate the case of a cylindrical object placed in the middle of an array of four microphones. Another example is shown in Figure 6, where the scattering object is shaped like a hemisphere. An example of a 3D configuration is shown in Figure 7, where six microphones are distributed over a rigid sphere. In this case, the z component of the vector d(k, n) can be obtained in a way similar to (9)-(14):

d_z(k, n) = |P_5(k, n)|^K - |P_6(k, n)|^K,   (15)

producing

d(k, n) = [d_x(k, n), d_y(k, n), d_z(k, n)]^T.   (16)

A well-known 3D configuration of directional microphones that is suitable for application in embodiments of this invention is the so-called A-format microphone, as described in P. G. Craven and M. A. Gerzon, US4042779 (A), 1977.

In order to follow the proposed directional magnitude combination approach, certain requirements need to be met. If directional microphones are used, then, for each microphone, the pick-up pattern should be approximately symmetric with respect to the orientation or look direction of the microphone. If the scattering/shading approach is used, then the scattering/shading effects should be approximately symmetric with respect to the direction of maximum response. These requirements are easily met when an arrangement is built as in the examples shown in Figures 3 to 7.

APPLICATION TO DirAC

The discussion above considers the estimation of the directional information (the DOA) only. In the context of directional audio coding, information on the diffuseness of the sound field may additionally be required. A straightforward approach is obtained by simply equating the estimated vector d(k, n), i.e., the determined directional information, with the opposite direction of the active sound intensity vector I_a(k, n):

I_a(k, n) ≈ -d(k, n).   (17)

This is possible since d(k, n) contains information related to the energy gradient. The diffuseness can then be computed according to (3).

5.2 METHOD ACCORDING TO FIGURE 8

Further embodiments of the present invention create a method for deriving directional information from a plurality of microphone signals or from a plurality of components of a microphone signal, in which different effective microphone look directions are associated with the microphone signals. This method 800 is presented as a flow chart in Figure 8. The method 800 comprises a step 801 of obtaining a magnitude value of a microphone signal or of a component of the microphone signal. In addition, the method 800 comprises a step 803 of combining (for example, linearly combining) direction information items that describe the effective microphone look directions, such that a direction information item describing a given effective microphone look direction is weighted depending on the magnitude value of the microphone signal or of the microphone signal component associated with that effective microphone look direction, to derive the directional information. The method 800 can be performed by the apparatus 100 (for example, by the combiner 105 of the apparatus 100).
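To round off the method description, here is a hedged sketch (helper names are ours) of the angle extraction of equation (8) and of a diffuseness computation that substitutes -d(k, n) for the active intensity as in equation (17); it assumes that d(k, n) and the energy values are consistently scaled so that this substitution is meaningful.

```python
import numpy as np

def angles_from_d(d):
    """Azimuth and elevation (in radians) of the estimate d(k, n), eq. (8)."""
    azimuth = np.arctan2(d[1], d[0])
    elevation = np.arctan2(d[2], np.hypot(d[0], d[1]))
    return azimuth, elevation

def diffuseness(d_frames, energy_frames, c=343.0):
    """Diffuseness via eq. (3), substituting I_a(k, n) = -d(k, n), eq. (17).

    d_frames:      T x 3 array of direction estimates over T time frames
                   (the mean over frames approximates the expectation).
    energy_frames: length-T array of sound field energy values E(k, n).
    """
    num = np.linalg.norm(np.mean(-np.asarray(d_frames), axis=0))
    return 1.0 - num / (c * np.mean(energy_frames))
```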
Next, two systems according to embodiments, which acquire the microphone signals and derive directional information from those microphone signals, are described using Figures 9 and 10.

5.3 SYSTEMS ACCORDING TO FIGURE 9 AND FIGURE 10

As is commonly known, using the pressure magnitude to extract directional information is not practical when using omnidirectional microphones. In fact, the magnitude differences due to the different distances traveled by the sound reaching the microphones are usually too small to be measured, so that most known algorithms depend mainly on the phase information. Embodiments overcome the spatial aliasing problem in directional parameter estimation. The systems described below make use of properly designed microphone arrangements, so that there is a usable magnitude difference between the microphone signals that depends on the direction of arrival. (Only) this magnitude information of the microphone spectra is then used in the estimation process, since the phase term is corrupted by the spatial aliasing effect.

Embodiments include the extraction of directional information (such as DOA or diffuseness) of a sound field analyzed in a time-frequency domain solely from the magnitudes of the spectra of two or more microphones, or of one microphone successively placed at two or more positions, for example by rotating the microphone. This is possible when the magnitudes vary sufficiently strongly in a predictable manner, depending on the direction of arrival. This can be achieved in two ways, namely: 1. by employing directional microphones (i.e., microphones having a non-isotropic pick-up pattern, such as cardioid microphones), where each microphone points in a different direction; or 2. by realizing, for each microphone or microphone position, a distinct scattering and/or shading effect. The latter can be achieved, for example, by employing a physical object in the center of the microphone configuration. Suitable objects modify the magnitudes of the microphone signals in a known way, through scattering and/or shading effects. An example of a system that uses the first method is shown in Figure 9.

5.3.1 SYSTEM USING DIRECTIONAL MICROPHONES, ACCORDING TO FIGURE 9

Figure 9 shows a schematic block diagram of a system 900. The system comprises an apparatus, for example the apparatus 100 according to Figure 1. In addition, the system 900 comprises a first directional microphone 901_1 having a first effective microphone look direction 903_1 for deriving a first microphone signal 103_1 of the plurality of microphone signals of the apparatus 100. The first microphone signal 103_1 is associated with the first look direction 903_1. The system 900 further comprises a second directional microphone 901_2 having a second effective microphone look direction 903_2 for deriving a second microphone signal 103_2 of the plurality of microphone signals of the apparatus 100. The second microphone signal 103_2 is associated with the second look direction 903_2. The first look direction 903_1 is different from the second look direction 903_2; for example, the look directions 903_1, 903_2 can be opposite. A further extension of this concept is shown in Figure 3, where four cardioid microphones (directional microphones) point in opposite directions of a Cartesian coordinate system. The microphone positions are marked by black circles. By applying directional microphones, it can be achieved that the magnitude differences between the directional microphones 901_1, 901_2 are large enough to determine the directional information 101.
An example of a system that uses the second method, achieving a strong variation in the magnitudes of the different microphone signals with omnidirectional microphones, is shown in Figure 10.

5.3.2 SYSTEM USING OMNIDIRECTIONAL MICROPHONES, ACCORDING TO FIGURE 10

Figure 10 shows a system 1000 comprising an apparatus, for example the apparatus 100 according to Figure 1, for deriving directional information 101 from a plurality of microphone signals or components of a microphone signal. In addition, the system 1000 comprises a first omnidirectional microphone 1001_1 for deriving a first microphone signal 103_1 of the plurality of microphone signals of the apparatus 100, and a second omnidirectional microphone 1001_2 for deriving a second microphone signal 103_2 of the plurality of microphone signals of the apparatus 100. Furthermore, the system 1000 comprises a shading object 1005 (also denoted as scattering object 1005) placed between the first omnidirectional microphone 1001_1 and the second omnidirectional microphone 1001_2 to form effective response patterns of the first omnidirectional microphone 1001_1 and of the second omnidirectional microphone 1001_2, such that the formed effective response pattern of the first omnidirectional microphone 1001_1 comprises a first effective microphone look direction 1003_1 and the formed effective response pattern of the second omnidirectional microphone 1001_2 comprises a second effective microphone look direction 1003_2. In other words, by using the shading object 1005 between the omnidirectional microphones 1001_1, 1001_2, a directional behavior of the omnidirectional microphones 1001_1, 1001_2 can be achieved, so that measurable magnitude differences between the omnidirectional microphones 1001_1, 1001_2 can be obtained even with a small distance between them.

Further optional extensions of the system 1000 are given in Figures 4 to 6, in which different geometric objects are placed in the middle of a conventional arrangement of four (omnidirectional) microphones. Figure 4 shows an illustration of a microphone configuration that employs an object 1005 to cause scattering and shading effects. In this example of Figure 4, the object is a rigid cylinder. The positions of the four (omnidirectional) microphones 1001_1 to 1001_4 are marked by black circles. Figure 5 shows an illustration of a microphone configuration similar to Figure 4, but employing a different microphone placement (on a rigid surface of a rigid cylinder). The positions of the four (omnidirectional) microphones 1001_1 to 1001_4 are marked by black circles. In the example shown in Figure 5, the shading object 1005 comprises the rigid cylinder and the rigid surface. Figure 6 shows an illustration of a microphone configuration that employs a further object 1005 to cause scattering and shading effects. In this example, the object 1005 is a rigid hemisphere (with a rigid surface). The positions of the four (omnidirectional) microphones 1001_1 to 1001_4 are marked by black circles. In addition, Figure 7 presents an example of a three-dimensional DOA estimation (a derivation of three-dimensional directional information) using six (omnidirectional) microphones 1001_1 to 1001_6 distributed over a rigid sphere. In other words, Figure 7 presents an illustration of a 3D microphone configuration that employs an object 1005 to cause shading effects. In this example, the object is a rigid sphere.
The positions of the six (omnidirectional) microphones 1001_1 to 1001_6 are marked by black circles. From the magnitude differences between the different microphone signals generated by the different microphones shown in Figures 2 to 7, 9, and 10, embodiments compute the directional information following the approach explained for the apparatus 100 according to Figure 1.

According to further embodiments, the first directional microphone 901_1 or the first omnidirectional microphone 1001_1 and the second directional microphone 901_2 or the second omnidirectional microphone 1001_2 can be arranged so that the sum of a first direction information item, which is a vector pointing in the first effective microphone look direction 903_1, 1003_1, and a second direction information item, which is a vector pointing in the second effective microphone look direction 903_2, 1003_2, equals zero within a tolerance range of ±5%, ±10%, ±20%, or ±30% of the first direction information item or of the second direction information item. In other words, equation (6) can apply to the microphones of the systems 900, 1000, where b_i is the direction information item of the i-th microphone, i.e. a unit vector pointing in the effective microphone look direction of the i-th microphone.

In the following, alternative solutions for using the magnitude information of the microphone signals for directional parameter estimation are described.

5.4 ALTERNATIVE SOLUTIONS

5.4.1 CORRELATION-BASED APPROACH

An alternative approach that exploits only the magnitude information of the microphone signals for directional parameter estimation is proposed in this section. It is based on the correlation between the magnitude spectra of the microphone signals and corresponding a priori magnitude spectra, obtained from models or measurements. Let S_i(k, n) = |P_i(k, n)|^K denote the magnitude or power spectrum of the i-th microphone signal. Then, the measured magnitude array response S(k, n) of the N microphones is defined as

S(k, n) = [S_1(k, n), S_2(k, n), ..., S_N(k, n)]^T.   (19)

The corresponding magnitude array manifold vectors of the microphone array are denoted by S_M(φ, k, n). The magnitude array manifold vectors obviously depend on the DOA of the sound φ, whether directional microphones with different look directions or scattering/shading objects within the array are used. How the DOA of the sound influences the array manifold depends on the actual configuration of the array, and is determined by the directional patterns of the microphones and/or the scattering object included in the microphone configuration. The array manifold vectors can be determined from array measurements in which sound is played back from different directions. Alternatively, physical models can be applied. The effect of a cylindrical scatterer on the sound pressure distribution on its surface is, for example, described in H. Teutsch and W. Kellermann, Acoustic source detection and localization based on wavefield decomposition using circular microphone arrays, J. Acoust. Soc. Am., 5(120), 2006. To determine the desired DOA estimate of the sound, the magnitude array response and the magnitude array manifold vectors are correlated. The estimated DOA corresponds to the maximum of the normalized correlation, according to

φ_DOA(k, n) = argmax_φ [ S_M^T(φ, k, n) S(k, n) / ( ‖S_M(φ, k, n)‖ ‖S(k, n)‖ ) ].   (20)

Although only the 2D case of the DOA estimation is presented here, it is obvious that a 3D DOA estimation including azimuth and elevation can be performed in a similar way.
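A compact sketch of this correlation search (our illustration; the manifold lookup table, assumed to be gathered beforehand from measurements or models, maps each candidate angle to its magnitude array manifold vector):

```python
import numpy as np

def doa_by_correlation(S, manifold):
    """Correlation-based DOA estimate, eq. (20).

    S:        measured magnitude array response S(k, n), length N, eq. (19).
    manifold: dict mapping a candidate angle phi -> magnitude array manifold
              vector S_M(phi, k, n) of length N.
    """
    S = np.asarray(S, dtype=float)
    best_phi, best_corr = None, -np.inf
    for phi, Sm in manifold.items():
        Sm = np.asarray(Sm, dtype=float)
        corr = (Sm @ S) / (np.linalg.norm(Sm) * np.linalg.norm(S))
        if corr > best_corr:
            best_phi, best_corr = phi, corr
    return best_phi
```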
5.4.2 NOISE-SUBSPACE-BASED APPROACH

Another alternative approach that exploits only the magnitude information of the microphone signals for directional parameter estimation is proposed in this section. It is based on the well-known root-MUSIC algorithm (R. Schmidt, Multiple emitter location and signal parameter estimation, IEEE Transactions on Antennas and Propagation, 34(3):276-280, 1986), with the exception that, in the example presented, only the magnitude information is processed. Let S(k, n) be the measured magnitude array response, as defined in (19). In the following, the dependencies on k and n are omitted, since all steps are performed separately for each time-frequency tile. The correlation matrix R can be computed as

R = E{S S^H},   (21)

where (·)^H denotes the conjugate transpose and E{·} is the expectation operator. In practical applications, the expectation is generally approximated by temporal and/or spectral averaging. The eigenvalue decomposition of R can be written as

R = Q Λ Q^H,  with  Λ = diag(λ_1, ..., λ_N),   (22)

where λ_1, ..., λ_N are the eigenvalues and N is the number of microphones or measurement positions. When a strong plane wave arrives at the microphone array, one relatively large eigenvalue λ is obtained, while all other eigenvalues are close to zero. The eigenvectors corresponding to the small eigenvalues form the so-called noise subspace Q_n. This matrix is orthogonal to the so-called signal subspace Q_s, which contains the eigenvector(s) corresponding to the largest eigenvalue(s). The so-called MUSIC spectrum can be computed as

P(φ) = 1 / ( s^H(φ) Q_n Q_n^H s(φ) ),   (23)

where the steering vector s(φ) for the investigated look direction φ is obtained from the array manifold vectors S_M. P(φ) becomes maximum when the look direction φ corresponds to the actual DOA of the sound. Thus, the DOA of the sound φ_DOA can be determined by finding the φ for which P(φ) becomes maximum, that is,

φ_DOA = argmax_φ P(φ).   (24)
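The following sketch (our illustration; all quantities are real-valued here because only magnitudes enter, unlike in classical complex-valued MUSIC) implements equations (21) to (24) for one frequency bin:

```python
import numpy as np

def doa_by_magnitude_music(S_frames, manifold, n_sources=1):
    """Magnitude-domain MUSIC, eqs. (21)-(24).

    S_frames:  T x N matrix of magnitude array responses S(k, n) collected
               over T time frames (temporal averaging approximates the
               expectation in eq. (21)).
    manifold:  dict mapping a candidate angle phi -> steering vector s(phi).
    n_sources: assumed number of strong plane waves.
    """
    S_frames = np.asarray(S_frames, dtype=float)
    N = S_frames.shape[1]
    R = (S_frames.T @ S_frames) / S_frames.shape[0]   # eq. (21)
    eigval, eigvec = np.linalg.eigh(R)                # ascending eigenvalues
    Qn = eigvec[:, : N - n_sources]                   # noise subspace, eq. (22)
    pseudo = {}
    for phi, s in manifold.items():
        s = np.asarray(s, dtype=float)
        pseudo[phi] = 1.0 / (s @ Qn @ Qn.T @ s)       # MUSIC spectrum, eq. (23)
    return max(pseudo, key=pseudo.get)                # eq. (24)
```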
Kuech, "Localization of sound sources in reverberant environments based on Directional Audio Coding parameters," in Proc. AES 127th Convention, New York, NY, USA, 2009). In this example, the analysis of the direction is discussed from the point of view of a processing technique, Directional Audio Coding (DirAC), to record and reproduce the spatial sound in several applications (V. Pulkki, "Spatial sound reproduction with Directional Audio Coding, "J. Audio Eng. Soc, vol. 55, pp. 503-516, June 2007). In general, the direction analysis in DirAC is based on the measurement of the 3D sound intensity vector, which needs information about sound pressure and particle speed at a single point in the sound field. DirAC is therefore used with B-format signals in the form of an omnidirectional signal and three dipole signals directed along Cartesian coordinates. B-format signals can be derived from an array of closely spaced or coincident microphones (J. Merimaa, "Applications of a 3-D microphone array," in Proc. AES 112th Convention, Munich, Germany, 2002 and MA Gerzon, " The design of precisely coincident microphone arrays for stereo and surround sound, "in Proc. AES 50th Convention, 1975). A solution for the consumer level with four omnidirectional microphones placed in a square arrangement is used here. Unfortunately, dipole signals, which are derived as pressure gradients from this arrangement, suffer from high frequency spatial aliasing. Consequently, the direction is erroneously estimated above the frequency of spatial aliasing, which can be derived from the array spacing. In this example, a method for extending the reliable direction estimate above the frequency of spatial aliasing is presented with real omnidirectional microphones. The method uses the fact that a microphone alone shadows the incoming sound with relatively short wavelengths at high frequencies. This shading produces measurable inter-microphone level differences for the microphones placed in the array, depending on the direction of arrival. This makes it possible to approximate the sound intensity vector by computing an energy gradient between the microphone signals and, in addition, to estimate the direction of arrival based on this. In addition, the size of the microphone determines the threshold frequency, above which the level differences are sufficient to use energy gradients viable. Shading takes effect at lower frequencies with a larger size. The example also discusses how to optimize spacing in the array, depending on the microphone diaphragm size, to match the estimation methods that use both pressure and energy gradients. The example is organized as follows. Section 5.5.2 comments on the direction estimate using energy analysis with B-format signals, the creation of which with a square arrangement of omnidirectional microphones is described in Section 5.5.3. In Section 5.5.4, the method for estimating direction using energy gradients is presented with relatively large microphones in the square arrangement. Section 5.5.5 proposes a method for optimizing microphone spacing in the array. Method assessments are presented in Section 5.5.6. Finally, the conclusions are given in Section 5.5.7. 5.5.2 DIRECTION ESTIMATE IN ENERGY ANALYSIS The direction estimate with the energy analysis is based on the sound intensity vector, which represents the direction and magnitude of the liquid flow of the sound energy. 
For the analysis, the sound pressure p and the particle velocity u can be estimated at a point in the sound field using the omnidirectional signal W and the dipole signals (X, Y, and Z for the Cartesian directions) of the B-format, respectively. For the time-frequency analysis of the sound field, the short-time Fourier transform (STFT) with a 20 ms time window is applied to the B-format signals in the DirAC implementation presented here. Subsequently, the instantaneous active sound intensity is computed in each time-frequency tile of the STFT-transformed B-format signals as

I(t, f) = (√2 / Z_0) Re{W*(t, f) X(t, f)},   (25)

for which the dipoles are expressed as X(t, f) = [X(t, f), Y(t, f), Z(t, f)]^T. Here, t and f are time and frequency, respectively, and Z_0 is the acoustic impedance of the air. Furthermore, Z_0 = ρ_0 c, where ρ_0 is the mean density of the air and c is the speed of sound. The direction of arrival of the sound, expressed as azimuth θ and elevation angle φ, is defined as the opposite of the direction of the sound intensity vector.

5.5.3 MICROPHONE ARRANGEMENT FOR DERIVING B-FORMAT SIGNALS IN THE HORIZONTAL PLANE

Figure 11 presents an arrangement of four omnidirectional microphones with spacing d between opposing microphones. Such an arrangement, composed of four closely spaced omnidirectional microphones as shown in Figure 11, was used to derive the horizontal B-format signals (W, X, and Y) for estimating the azimuth angle θ of the direction in DirAC (M. Kallinger, G. Del Galdo, F. Kuech, D. Mahne, and R. Schultz-Amling, "Spatial filtering using Directional Audio Coding parameters," in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing, IEEE Computer Society, pp. 217-220, 2009, and O. Thiergart, R. Schultz-Amling, G. Del Galdo, D. Mahne, and F. Kuech, "Localization of sound sources in reverberant environments based on Directional Audio Coding parameters," in Proc. AES 127th Convention, New York, NY, USA, 2009). Microphones of relatively small size are typically positioned a few centimeters (for example, 2 cm) from each other. With this arrangement, the omnidirectional signal W can be produced as an average over the microphone signals, and the dipole signals X and Y are derived as pressure gradients by subtracting the signals of opposing microphones:

X(t, f) = A(f) [P_1(t, f) - P_3(t, f)],
Y(t, f) = A(f) [P_2(t, f) - P_4(t, f)].   (26)

Here, P_1, P_2, P_3, and P_4 are the STFT-transformed microphone signals, and A(f) is a frequency-dependent equalization constant, A(f) = -j(cN)/(2πfd f_s), where j is the imaginary unit, N is the number of STFT frequency bins, d is the distance between opposing microphones, and f_s is the sampling rate. As already mentioned, spatial aliasing takes effect on the pressure gradients and begins to distort the dipole signals when half the wavelength of the incoming sound is less than the distance between the opposing microphones. The theoretical spatial aliasing frequency f_sa, defining the upper frequency limit for a valid dipole signal, is therefore computed as

f_sa = c / (2d),   (27)

above which the direction is estimated erroneously.
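A short sketch of equations (25) to (27) for one STFT bin (our variable names; the equalizer A(f) follows the definition above, and the numeric value of Z_0 is an assumed standard-condition approximation):

```python
import numpy as np

def horizontal_b_format(P1, P2, P3, P4, f, d, fs, N, c=343.0):
    """W, X, Y signals from a square array of four omnis, cf. eq. (26).

    P1/P3 and P2/P4 are opposing STFT-domain microphone signals along x
    and y; f is the bin frequency (> 0), d the spacing, fs the sampling
    rate, and N the number of STFT frequency bins.
    """
    W = 0.25 * (P1 + P2 + P3 + P4)               # omnidirectional signal
    A = -1j * c * N / (2 * np.pi * f * d * fs)   # equalizer A(f)
    return W, A * (P1 - P3), A * (P2 - P4)       # W, X, Y

def active_intensity_2d(W, X, Y, Z0=413.3):
    """Horizontal active intensity, cf. eq. (25); Z0 = rho_0 * c."""
    return np.sqrt(2) / Z0 * np.real(np.conj(W) * np.array([X, Y]))

def f_spatial_aliasing(d, c=343.0):
    """Spatial aliasing limit of eq. (27); e.g. d = 0.02 m -> ~8575 Hz."""
    return c / (2 * d)
```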
5.5.4 DIRECTION ESTIMATION USING ENERGY GRADIENTS

Since spatial aliasing and the shading-induced directivity of the microphones inhibit the use of pressure gradients at high frequencies, a method for extending the frequency range of reliable direction estimation is desired. Here, an arrangement of four omnidirectional microphones, arranged so that their on-axis directions point outward in opposing directions, is employed in the proposed broadband direction estimation method. Figure 12 shows this arrangement, in which a different amount of sound energy of a plane wave is captured by the different microphones. The four omnidirectional microphones 1001_1 to 1001_4 of the arrangement shown in Figure 12 are mounted on the end of a cylinder. The on-axis directions 1003_1 to 1003_4 of the microphones point outward from the center of the array. This arrangement is used to estimate the direction of arrival of a sound wave using energy gradients. The energy differences are assumed here to make it possible to approximate the 2D sound intensity vector, with the axial components along x and y approximated by subtracting the power spectra of opposing microphones as

I_x(t, f) ≈ |P_1(t, f)|² - |P_3(t, f)|²,   (28)
I_y(t, f) ≈ |P_2(t, f)|² - |P_4(t, f)|².   (29)

The azimuth angle θ of the arriving plane wave can then be obtained from the intensity approximations along x and y. To make the computation described above feasible, inter-microphone level differences large enough to be measured with an acceptable signal-to-noise ratio are desired. Thus, microphones having relatively large diaphragms are employed in the arrangement. Energy gradients cannot, however, be used to estimate the direction at low frequencies, where the microphones do not shade the incoming sound wave having relatively long wavelengths. Thus, the direction information at high frequencies obtained with energy gradients is combined with the direction information at low frequencies obtained with pressure gradients. The natural crossover frequency between the two techniques is the spatial aliasing frequency f_sa according to Equation (27).

5.5.5 OPTIMIZATION OF THE ARRAY SPACING

As stated earlier, the diaphragm size determines the frequencies at which shading by the microphone is effective for computing energy gradients. In order to match the spatial aliasing frequency f_sa with the limit frequency f_lim for using the energy gradients, the microphones must be positioned at an appropriate distance from each other in the array. Thus, the choice of the spacing between microphones with a given diaphragm size is discussed in this section. The frequency-dependent directivity index of an omnidirectional microphone can be measured in decibels as

DI = 10 log10(ΔL),   (30)

where ΔL is the ratio of the on-axis pick-up energy to the total pick-up energy integrated over all directions (J. Eargle, "The microphone book," Focal Press, Boston, USA, 2001). The directivity index at each frequency depends on the ratio ka between the diaphragm circumference and the wavelength, ka = 2πr/λ, where r is the radius of the diaphragm and λ = c/f is the wavelength. The dependence of the directivity index DI on ka was shown by simulation in J. Eargle, "The microphone book," Focal Press, Boston, USA, 2001, to be a monotonically increasing function, as plotted in Figure 13. The directivity index DI in decibels shown in Figure 13 is adapted from that reference; the theoretical indices are plotted as a function of ka, which represents the diaphragm circumference of the omnidirectional microphone divided by the wavelength. This dependence is used here to define the ratio ka for a desired directivity index DI. In this example, DI is set to 2.8 dB, yielding a ka value of 1. The optimized spacing of microphones with a given directivity index can now be defined by using Equation (27) and Equation (30), setting the spatial aliasing frequency equal to the limit frequency. The optimized spacing is therefore computed as

d = πr.   (31)
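The high-frequency estimator and the spacing rule can be sketched as follows (our illustration; the crossover helper is an assumption about how the two estimates would be combined in practice):

```python
import numpy as np

def azimuth_from_energy_gradients(P1, P2, P3, P4):
    """High-frequency azimuth from energy gradients, cf. eqs. (28)-(29)."""
    Ix = np.abs(P1) ** 2 - np.abs(P3) ** 2   # power difference along x
    Iy = np.abs(P2) ** 2 - np.abs(P4) ** 2   # power difference along y
    return np.degrees(np.arctan2(Iy, Ix))

def optimized_spacing(r):
    """Spacing that matches f_sa to the ka = 1 shading limit, eq. (31)."""
    return np.pi * r                          # d = pi * r

# Sanity check against the values quoted in the text: the 2.1 cm AKG
# diaphragm (r = 0.0105 m) gives d ~ 0.033 m and f_sa = c/(2d) ~ 5197 Hz.
def broadband_azimuth(f, f_sa, az_pressure_gradient, az_energy_gradient):
    """Crossover between the two estimators at the aliasing frequency."""
    return az_pressure_gradient if f < f_sa else az_energy_gradient
```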
5.5.6 EVALUATION OF THE DIRECTION ESTIMATES

The direction estimation methods discussed in this example are now evaluated in a DirAC analysis with anechoic measurements and simulations. Rather than measuring four microphones in a square arrangement at the same time, impulse responses were measured from multiple directions with a single omnidirectional microphone having a relatively large diaphragm. The measured responses were subsequently used to estimate the impulse responses of four omnidirectional microphones placed in a square, as shown in Figure 12. Consequently, the energy gradients depended mainly on the diaphragm size of the microphone, and the spacing optimization could therefore be studied as described in Section 5.5.5. Obviously, four microphones in the array would effectively provide more shading of the incoming sound wave, and the direction estimation would be improved slightly compared to the single-microphone case. The evaluations described above were carried out with two different microphones having different diaphragm sizes. Impulse responses were measured at 5-degree intervals, using a movable loudspeaker (Genelec 8030A) at a distance of 1.6 m in an anechoic chamber. The measurements at different angles were conducted using a sine sweep from 20 Hz to 20000 Hz of 1 s length. The A-weighted sound pressure level was 75 dB. The measurements were conducted using omnidirectional microphones of type G.R.A.S. 40AI and AKG CK 62-UL with 1.27 cm (0.5 inch) and 2.1 cm (0.8 inch) diaphragms, respectively. In the simulations, the directivity index DI was set to 2.8 dB, which corresponds to a ka ratio of value 1 in Figure 13. According to the optimized microphone spacing in Equation (31), the opposing microphones were simulated at distances of 2 cm and 3.3 cm for the G.R.A.S. and AKG microphones, respectively. These spacings result in spatial aliasing frequencies of 8575 Hz and 5197 Hz.

Figures 14 and 15 show directional patterns with the G.R.A.S. and AKG microphones, respectively: a) energy of a single microphone, b) pressure gradient between two microphones, and c) energy gradient between two microphones. Figure 14 shows logarithmic directional patterns based on the G.R.A.S. microphone. The patterns are normalized and plotted in third-octave bands with the center frequencies of 8 kHz (curves with reference number 1401), 10 kHz (curves with reference number 1403), 12.5 kHz (curves with reference number 1405), and 16 kHz (curves with reference number 1407). The pattern of an ideal dipole with a deviation of ±1 dB is denoted by an area 1409 in 14b) and 14c). Figure 15 shows logarithmic directional patterns with the AKG microphone. The patterns are normalized and plotted in third-octave bands with the center frequencies of 5 kHz (curves with reference number 1501), 8 kHz (curves with reference number 1503), 12.5 kHz (curves with reference number 1505), and 16 kHz (curves with reference number 1507). The pattern of an ideal dipole with a deviation of ±1 dB is denoted by an area 1509 in 15b) and 15c). The normalized patterns are plotted in third-octave bands with center frequencies starting close to the theoretical spatial aliasing frequencies of 8575 Hz (G.R.A.S.) and 5197 Hz (AKG). It should be noted that different center frequencies are used for the G.R.A.S. and AKG microphones. In addition, the directional pattern of an ideal dipole with a deviation of ±1 dB is denoted as areas 1409, 1509 in the plots of the pressure and energy gradients.
The patterns in Figure 14a) and Figure 15a) reveal that the individual omnidirectional microphone has significant directivity at high frequencies due to shading. With the G.R.A.S. microphone and 2 cm spacing in the array, the dipole pattern derived from the pressure gradient disperses as a function of frequency in Figure 14b). The energy gradient produces dipole patterns, although slightly narrower than the ideal at 12.5 kHz and 16 kHz in Figure 14c). With the AKG microphone and 3.3 cm spacing in the array, the directional pattern of the pressure gradient is dispersed and distorted at 8 kHz, 12.5 kHz and 16 kHz, whereas with the energy gradient the dipole patterns narrow as a function of frequency but still compare well to the ideal dipole. Figure 16 presents the results of the direction analysis as root mean square errors (RMSE) over frequency, when the measured responses of the G.R.A.S. and AKG microphones are used to simulate the microphone array, in 16a) and 16b), respectively. In Figure 16, the direction was estimated using arrangements of four omnidirectional microphones, which were modeled using the impulse responses measured from the real microphones. The direction analyses were performed by convolving the impulse responses of the microphones at 0°, 5°, 10°, 15°, 20°, 25°, 30°, 35°, 40° and 45°, alternately, with a white noise sample, and estimating the direction within 20 ms STFT windows in the DirAC analysis. Visual inspection of the results reveals that the direction is estimated accurately up to frequencies of 10 kHz in 16a) and 6.5 kHz in 16b) using pressure gradients, and above those frequencies using energy gradients. These frequencies are, however, slightly higher than the theoretical spatial aliasing frequencies of 8575 Hz and 5197 Hz for the optimized microphone spacings of 2 cm and 3.3 cm, respectively. In addition, frequency ranges in which both pressure and energy gradients yield a valid direction estimate exist from 8 kHz to 10 kHz with the G.R.A.S. microphone in 16a) and from 3 kHz to 6.5 kHz with the AKG microphone in 16b). The optimization of the microphone spacing with the given values thus provides a good estimate in these cases.

5.5.7 CONCLUSION

This example presents a method/apparatus for analyzing the direction of arrival of sound over a wide audio frequency range, in which pressure and energy gradients between omnidirectional microphones are computed at low and high frequencies, respectively, and used to estimate the sound intensity vectors. The method/apparatus was employed with an arrangement of four omnidirectional microphones facing opposite directions and having relatively large diaphragms, which provided inter-microphone level differences measurable enough to compute energy gradients at high frequencies. It was shown that the presented method/apparatus provides a reliable direction estimate over the wide audio frequency range, while a conventional method/apparatus employing only pressure gradients in the energetic analysis of the sound field suffers from spatial aliasing and thus produces highly erroneous direction estimates at high frequencies. To summarize, the example presented a method/apparatus for estimating the sound direction by computing the sound intensity from pressure and energy gradients of closely spaced omnidirectional microphones in a frequency-dependent manner.
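The frequency-dependent combination summarized in this conclusion can be sketched as follows. The pressure-gradient branch uses a common first-order approximation (pair average for the pressure, pair difference for its gradient); that particular formula, the sign conventions and the function name are assumptions made for illustration, not the document's own equations.

```python
import numpy as np

def broadband_azimuth(P1, P2, P3, P4, freqs, f_sa, d):
    """Sketch of the combined analysis: pressure gradients (phase-based)
    below the spatial aliasing frequency f_sa, energy gradients
    (magnitude-based) above it. P1..P4 are STFT spectra over frequency;
    d is the spacing of opposing microphones."""
    # Pressure-gradient intensity (uses phase; trusted only below f_sa).
    # Pressure approximated by the pair average, its gradient by the
    # pair difference -- an assumed first-order approximation.
    Ix_p = np.imag(np.conj((P1 + P3) / 2.0) * (P1 - P3) / d)
    Iy_p = np.imag(np.conj((P2 + P4) / 2.0) * (P2 - P4) / d)
    # Energy-gradient intensity (magnitudes only; used above f_sa).
    Ix_e = np.abs(P1) ** 2 - np.abs(P3) ** 2
    Iy_e = np.abs(P2) ** 2 - np.abs(P4) ** 2
    below = freqs < f_sa
    Ix = np.where(below, Ix_p, Ix_e)
    Iy = np.where(below, Iy_p, Iy_e)
    return np.arctan2(Iy, Ix)
```

The switch at f_sa reflects the crossover described above: the phases of the microphone signals are trusted only below the spatial aliasing frequency.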
In other words, the embodiments provide an apparatus and/or a method configured to estimate directional information from pressure and energy gradients of closely spaced omnidirectional microphones in a frequency-dependent manner. Microphones with relatively large diaphragms, which shade the sound wave, are used here to provide inter-microphone level differences large enough to compute viable energy gradients at high frequencies. The example was evaluated in the direction analysis of the spatial processing technique directional audio coding (DirAC). It has been shown that the method/apparatus provides reliable direction estimation information over the full audio frequency range, while traditional methods that employ only pressure gradients produce a highly erroneous estimate at high frequencies. From this example, it can be seen that, in a further embodiment, a combiner of an apparatus according to this embodiment is configured to derive the directional information based on the magnitude values, and independently of the phases, of the microphone signals or of the components of the microphone signal in a first frequency range (for example, above the spatial aliasing limit). In addition, the combiner can be configured to derive the directional information depending on the phases of the microphone signals or of the components of the microphone signal in a second frequency range (for example, below the spatial aliasing limit). In other words, the embodiments of the present invention can be configured to derive the directional information in a frequency-selective manner, so that, in a first frequency range, the directional information is based only on the magnitudes of the microphone signals or of the components of the microphone signal and, in a second frequency range, the directional information is also based on the phases of the microphone signals or of the components of the microphone signal.

6. SUMMARY

To summarize, the embodiments of the present invention estimate directional parameters of a sound field by considering (only) the magnitudes of the microphone spectra. This is especially useful in practice when the phase information of the microphone signals is ambiguous, that is, when spatial aliasing effects occur. In order to be able to extract the desired directional information, the embodiments of the present invention (for example, the system 900) use suitable configurations of directional microphones having different visual directions. Alternatively (for example, in the system 1000), objects that cause direction-dependent scattering and shading effects can be included in the microphone configurations. On certain commercial microphones (for example, large-diaphragm microphones), the microphone capsules are mounted in relatively large housings. The resulting shading/scattering effect may already be sufficient to employ the concept of the present invention. According to further embodiments, the magnitude-based parameter estimation performed by the embodiments of the present invention can also be applied in combination with traditional estimation methods, which also consider the phase information of the microphone signals.
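As a minimal sketch of the magnitude-weighted combination described in this summary: each effective microphone visual direction contributes a unit vector b_i, weighted by a power of the corresponding magnitude value. The function signature and the exponent parameter name are illustrative assumptions; the exponent 2 mirrors the squared-magnitude (power) weighting variant mentioned in the claims.

```python
import numpy as np

def combine_direction_items(magnitudes, look_directions, kappa=2.0):
    """Combine direction information items weighted by magnitude values.

    magnitudes      : array of shape (N,), |P_i(k, n)| of each microphone
                      signal for one time-frequency tile.
    look_directions : array of shape (N, 2) or (N, 3), unit vectors b_i
                      pointing in the effective microphone visual
                      directions.
    kappa           : weighting exponent (assumed name; kappa = 2 weights
                      by signal power)."""
    weights = magnitudes ** kappa
    # Weighted linear combination of the direction information items,
    # yielding the directional information d(k, n).
    return (weights[:, None] * look_directions).sum(axis=0)

# Example: four look directions summing to zero, as in the arrangement
# where opposing microphones point outward from the array center.
b = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
mags = np.array([0.9, 0.5, 0.2, 0.4])  # illustrative magnitude values
print(combine_direction_items(mags, b))  # points toward the strongest pickup
```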
Although some aspects have been described in the context of an apparatus, it is clear that these aspects also represent a description of the corresponding method, where a block or device corresponds to a method step or to an aspect of a method step. Similarly, aspects described in the context of a method step also represent a description of a corresponding block or item or of an aspect of a corresponding apparatus. Some or all of the method steps can be performed by (or using) a hardware device, such as a microprocessor, a programmable computer or an electronic circuit. In some embodiments, one or more of the most important method steps can be performed by such a device. Depending on certain implementation requirements, the embodiments of the invention can be implemented in hardware or in software. The implementation can be carried out using a digital storage medium, for example, a floppy disk, a DVD, a Blu-Ray, a CD, a ROM, a PROM, an EPROM, an EEPROM or a FLASH memory, with electronically readable control signals stored thereon, which cooperate (or are capable of cooperating) with a programmable computer system, so that the respective method is carried out. Therefore, the digital storage medium can be computer readable. Some embodiments according to the invention comprise a data carrier having electronically readable control signals, which are capable of cooperating with a programmable computer system, so that one of the methods described herein is performed. In general, the embodiments of the present invention can be implemented as a computer program product with a program code, the program code being operative to perform one of the methods when the computer program product runs on a computer. The program code can, for example, be stored on a machine-readable carrier. Other embodiments comprise the computer program for performing one of the methods described herein, stored on a machine-readable carrier. In other words, an embodiment of the method of the present invention is, therefore, a computer program having a program code for performing one of the methods described herein, when the computer program runs on a computer. A further embodiment of the methods of the present invention is, therefore, a data carrier (either a digital storage medium or a computer-readable medium) comprising, recorded thereon, the computer program for carrying out one of the methods described herein. The data carrier, the digital storage medium or the recorded medium are typically tangible and/or non-transitory. A further embodiment of the method of the present invention is, therefore, a data stream or a sequence of signals representing the computer program for performing one of the methods described herein. The data stream or the sequence of signals can, for example, be configured to be transferred via a data communication connection, for example, via the Internet. A further embodiment comprises a processing means, for example a computer or a programmable logic device, configured or adapted to carry out one of the methods described herein. A further embodiment comprises a computer having installed thereon the computer program for performing one of the methods described herein. A further embodiment according to the invention comprises an apparatus or system configured to transfer (for example, electronically or optically) a computer program for performing one of the methods described herein to a receiver. The receiver can, for example, be a computer, a mobile device, a memory device or the like. The apparatus or system may, for example, comprise a file server for transferring the computer program to the receiver.
In some embodiments, a programmable logic device (for example, a field programmable gate array) can be used to perform some or all of the functionality of the methods described herein. In some embodiments, a field programmable gate array can cooperate with a microprocessor in order to perform one of the methods described herein. In general, the methods are preferably performed by any hardware device. The embodiments described above are merely illustrative of the principles of the present invention. It should be understood that modifications and variations of the arrangements and of the details described herein will be apparent to other persons skilled in the art. It is the intent, therefore, to be limited only by the scope of the appended patent claims and not by the specific details presented by way of description and explanation of the embodiments herein.
Claims (18) [0001] 1. APPARATUS (100) FOR DERIVING DIRECTIONAL INFORMATION (101, d(k, n)) FROM A PLURALITY OF MICROPHONE SIGNALS (1031 to 103N, P1 to PN) OR FROM A PLURALITY OF COMPONENTS (Pi(k, n)) OF A MICROPHONE SIGNAL (103i, Pi), IN WHICH DIFFERENT EFFECTIVE MICROPHONE VISUAL DIRECTIONS ARE ASSOCIATED WITH THE MICROPHONE SIGNALS (1031 to 103N, P1 to PN) OR COMPONENTS (Pi(k, n)), the apparatus (100) characterized by comprising: a combiner (105) configured to obtain a magnitude value of a microphone signal (Pi) or of a component (Pi(k, n)) of the microphone signal (Pi), and to combine direction information items (b1 to bN) that describe the effective microphone visual directions, so that a direction information item (bi) that describes a given effective microphone visual direction is weighted depending on the magnitude value of the microphone signal (Pi), or of the component (Pi(k, n)) of the microphone signal (Pi), associated with the given effective microphone visual direction, to derive the directional information (101, d(k, n)). [0002] 2. APPARATUS (100) according to claim 1, characterized in that an effective microphone visual direction associated with a microphone signal (Pi) describes the direction in which a microphone, from which the microphone signal (Pi) is derived, has its maximum response. [0003] 3. APPARATUS (100) according to one of the preceding claims, characterized in that the direction information item (bi) that describes the given effective microphone visual direction is a vector pointing in the given effective microphone visual direction. [0004] 4. APPARATUS (100) according to one of the preceding claims, characterized in that the combiner (105) is configured to obtain the magnitude value so that the magnitude value describes a magnitude of a spectral coefficient (Pi(k, n)) which represents a spectral subregion (k) of the microphone signal (Pi). [0005] 5. APPARATUS (100) according to one of the preceding claims, characterized in that the combiner (105) is configured to derive the directional information (101, d(k, n)) based on a time-frequency representation of the microphone signals (P1 to PN) or components. [0006] 6. APPARATUS (100) according to one of the preceding claims, characterized in that the combiner (105) is configured to combine the direction information items (b1 to bN) weighted depending on the magnitude values that are associated with a given time-frequency tile (k, n), in order to derive the directional information (d(k, n)) for the given time-frequency tile (k, n). [0007] 7. APPARATUS (100) according to one of the preceding claims, characterized in that the combiner (105) is configured to combine, for a plurality of different time-frequency tiles, the same direction information items (b1 to bN), which are weighted differently depending on the magnitude values associated with the different time-frequency tiles.
[0008] 8. APPARATUS according to one of the preceding claims, characterized in that a first effective microphone visual direction is associated with a first microphone signal of the plurality of microphone signals; wherein a second effective microphone visual direction is associated with a second microphone signal of the plurality of microphone signals; wherein the first effective microphone visual direction is different from the second effective microphone visual direction; and wherein the combiner is configured to obtain a first magnitude value of the first microphone signal or of a component of the first microphone signal, to obtain a second magnitude value of the second microphone signal or of a component of the second microphone signal, and to combine a first direction information item that describes the first effective microphone visual direction and a second direction information item that describes the second effective microphone visual direction, so that the first direction information item is weighted by the first magnitude value and the second direction information item is weighted by the second magnitude value, to derive the directional information. [0009] 9. APPARATUS according to one of the preceding claims, characterized in that the combiner is configured to obtain a squared magnitude value based on the magnitude value, the squared magnitude value describing a power of the microphone signal (Pi) or of the component (Pi(k, n)) of the microphone signal, and wherein the combiner is configured to combine the direction information items (b1 to bN) so that a direction information item (bi) is weighted depending on the squared magnitude value of the microphone signal (Pi) or of the component (Pi(k, n)) of the microphone signal (Pi) associated with the given effective microphone visual direction. [0010] 10. APPARATUS (100) according to one of the preceding claims, characterized in that the combiner (105) is configured to derive the directional information (d(k, n)) according to the following equation: [0011] 11. APPARATUS according to claim 10, characterized in that K > 0. [0012] 12. APPARATUS according to one of the preceding claims, characterized in that the combiner is configured to derive the directional information (d(k, n)) based on the magnitude values and independently of the phases of the microphone signals (P1 to PN) or of the components (Pi(k, n)) of the microphone signal (Pi) in a first frequency range; and wherein the combiner is further configured to derive the directional information depending on the phases of the microphone signals (P1 to PN) or of the components (Pi(k, n)) of the microphone signal (Pi) in a second frequency range. [0013] 13. APPARATUS according to one of the preceding claims, characterized in that the combiner is configured so that the direction information item (bi) is weighted only depending on the magnitude value. [0014] 14. APPARATUS (100) according to one of the preceding claims, characterized in that the combiner (105) is configured to combine the direction information items (b1 to bN) linearly. [0015] 15.
SYSTEM (900), characterized in that it comprises: an apparatus (100) according to one of the preceding claims; a first directional microphone (9011) having a first effective microphone visual direction (9031) for deriving a first microphone signal (1031) of the plurality of microphone signals, the first microphone signal (1031) being associated with the first effective microphone visual direction (9031); and a second directional microphone (9012) having a second effective microphone visual direction (9032) for deriving a second microphone signal (1032) of the plurality of microphone signals, the second microphone signal (1032) being associated with the second effective microphone visual direction (9032); and wherein the first visual direction (9031) is different from the second visual direction (9032). [0016] 16. SYSTEM (1000), characterized in that it comprises: an apparatus according to one of claims 1 to 14; a first omnidirectional microphone (10011) for deriving a first microphone signal (1031) of the plurality of microphone signals; a second omnidirectional microphone (10012) for deriving a second microphone signal (1032); and a shading object (1005) placed between the first omnidirectional microphone (10011) and the second omnidirectional microphone (10012) to form effective response patterns of the first omnidirectional microphone (10011) and of the second omnidirectional microphone (10012), so that an effective response pattern formed of the first omnidirectional microphone (10011) comprises a first effective microphone visual direction (10031) and an effective response pattern formed of the second omnidirectional microphone (10012) comprises a second effective microphone visual direction (10032), which is different from the first effective microphone visual direction (10031). [0017] 17. SYSTEM according to one of claims 15 or 16, characterized in that the directional microphones (9011, 9012) or the omnidirectional microphones (10011, 10012) are arranged so that a sum of the direction information items, which are vectors pointing in the effective microphone visual directions (9031, 9032, 10031, 10032), is equal to zero within a tolerance range of ±30% of the norm of one of the direction information items. [0018] 18. METHOD (800) FOR DERIVING DIRECTIONAL INFORMATION FROM A PLURALITY OF MICROPHONE SIGNALS OR FROM A PLURALITY OF COMPONENTS OF A MICROPHONE SIGNAL, IN WHICH DIFFERENT EFFECTIVE MICROPHONE VISUAL DIRECTIONS ARE ASSOCIATED WITH THE MICROPHONE SIGNALS OR COMPONENTS, the method characterized by comprising: obtaining (801) a magnitude value of the microphone signal or of a component of the microphone signal; and combining (803) direction information items that describe the effective microphone visual directions, so that a direction information item describing a given effective microphone visual direction is weighted depending on the magnitude value of the microphone signal or of the microphone signal component associated with the given effective microphone visual direction, to derive the directional information.